{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 企业级问答系统 —— 离线知识生产流程\n",
    "\n",
    "## 一、背景介绍\n",
    "\n",
    "智能问答系统是当前应用范围最广，最容易落地的大模型应用。通常来说，智能问答系统包括以下五部分：\n",
    "1. Query理解\n",
    "2. 混合检索召回\n",
    "3. 语义排序\n",
    "4. 答案生成\n",
    "5. 追问生成\n",
    "\n",
    "AppBuilder在问答系统沉淀了最佳实践流程，并以极易用的组件化的方式对外开放，可以使用页面操作 + 低代码的方式快速搭建自己的问答系统。\n",
    "\n",
    "<img src=\"./qa_system/qa_flow.png\" alt=\"drawing\" width=\"1000\"/>\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 二、实操流程\n",
    "\n",
    "整体使用流程包括以下三个环节：\n",
    "\n",
    "1. 登录[百度智能云千帆AppBuilder官网](https://cloud.baidu.com/product/AppBuilder)创建账户，\n",
    "\n",
    "<img src=\"./qa_system/product.png\" alt=\"drawing\" width=\"1000\"/>\n",
    "\n",
    "\n",
    "2. 在[百度智能云千帆AppBuilder控制台](https://console.bce.baidu.com/ai_apaas/dialogHome)右侧头像栏『API密钥』页面获取密钥，并复制。\n",
    "\n",
    "<img src=\"./qa_system/token.jpg\" alt=\"drawing\" width=\"1000\"/>\n",
    "\n",
    "\n",
    "<img src=\"./qa_system/get_token.png\" alt=\"drawing\" width=\"1000\"/>\n",
    "\n",
    "\n",
    "3. 引用AppBuilder SDK代码，调用Dataset API相关接口，创建数据集，并增、查、删除文档。"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.1 环境准备\n",
    "\n",
    "首先需要安装AppBuilder-SDK代码库，若已在开发环境安装，则可跳过此步。\n",
    "\n",
    "**注意：**: appbuilder-sdk 的python版本要求 3.9+"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Defaulting to user installation because normal site-packages is not writeable\n",
      "Collecting appbuilder-sdk\n",
      "  Using cached appbuilder_sdk-0.6.0-py3-none-any.whl.metadata (768 bytes)\n",
      "Requirement already satisfied: requests in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (2.31.0)\n",
      "Requirement already satisfied: proto-plus==1.22.3 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (1.22.3)\n",
      "Requirement already satisfied: pydantic==2.6.4 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (2.6.4)\n",
      "Requirement already satisfied: numpy in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (1.24.4)\n",
      "Requirement already satisfied: SQLAlchemy~=1.4.50 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (1.4.52)\n",
      "Requirement already satisfied: urllib3<2.0.0 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (1.26.18)\n",
      "Requirement already satisfied: tenacity in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (8.2.3)\n",
      "Requirement already satisfied: pandas in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (2.2.1)\n",
      "Requirement already satisfied: openpyxl in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (3.1.2)\n",
      "Requirement already satisfied: pymochow>=1.1.2 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (1.1.2)\n",
      "Requirement already satisfied: parameterized in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from appbuilder-sdk) (0.9.0)\n",
      "Requirement already satisfied: protobuf<5.0.0dev,>=3.19.0 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from proto-plus==1.22.3->appbuilder-sdk) (4.25.3)\n",
      "Requirement already satisfied: annotated-types>=0.4.0 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from pydantic==2.6.4->appbuilder-sdk) (0.6.0)\n",
      "Requirement already satisfied: pydantic-core==2.16.3 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from pydantic==2.6.4->appbuilder-sdk) (2.16.3)\n",
      "Requirement already satisfied: typing-extensions>=4.6.1 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from pydantic==2.6.4->appbuilder-sdk) (4.10.0)\n",
      "Requirement already satisfied: orjson in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from pymochow>=1.1.2->appbuilder-sdk) (3.9.15)\n",
      "Requirement already satisfied: future in /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/site-packages (from pymochow>=1.1.2->appbuilder-sdk) (0.18.2)\n",
      "Requirement already satisfied: et-xmlfile in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from openpyxl->appbuilder-sdk) (1.1.0)\n",
      "Requirement already satisfied: python-dateutil>=2.8.2 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from pandas->appbuilder-sdk) (2.8.2)\n",
      "Requirement already satisfied: pytz>=2020.1 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from pandas->appbuilder-sdk) (2024.1)\n",
      "Requirement already satisfied: tzdata>=2022.7 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from pandas->appbuilder-sdk) (2024.1)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from requests->appbuilder-sdk) (3.3.2)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from requests->appbuilder-sdk) (3.6)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from requests->appbuilder-sdk) (2024.2.2)\n",
      "Requirement already satisfied: six>=1.5 in /Users/chengmo/Library/Python/3.9/lib/python/site-packages (from python-dateutil>=2.8.2->pandas->appbuilder-sdk) (1.16.0)\n",
      "Using cached appbuilder_sdk-0.6.0-py3-none-any.whl (335 kB)\n",
      "Installing collected packages: appbuilder-sdk\n",
      "Successfully installed appbuilder-sdk-0.6.0\n"
     ]
    }
   ],
   "source": [
    "!python3 -m pip install appbuilder-sdk"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 引入AppBuilder-SDK，并设置TOKEN\n",
    "\n",
    "**注意：** 请使用刚才在AppBuilder平台上新建的个人TOKEN"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "AppBuilder 模块导入成功！\n",
      "您的AppBuilder Token为：your_appbuilder_token\n"
     ]
    }
   ],
   "source": [
    "# 引入os模块，引入appbuilder 模块\n",
    "import os\n",
    "import appbuilder\n",
    "\n",
    "# 设置appbuilder的token密钥，从页面上复制粘贴token，覆盖此处的 \"your_appbuilder_token\"\n",
    "os.environ['APPBUILDER_TOKEN'] = \"your_appbuilder_token\"\n",
    "\n",
    "print(\"AppBuilder 模块导入成功！\")\n",
    "print(\"您的AppBuilder Token为：{}\".format(os.environ['APPBUILDER_TOKEN']))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.3 创建全新知识库\n",
    "\n",
    "我们从零开始创建一个新的知识库，并使用它来支持问答系统。我们将其命名为**说明书知识库**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "知识库创建成功！ 知识库的ID为：c368a982-699f-4114-9e9a-c08b76c07f92， 知识库的名称为： 说明书知识库\n"
     ]
    }
   ],
   "source": [
    "# 创建全新知识库\n",
    "dataset = appbuilder.console.Dataset.create_dataset(\"说明书知识库\")\n",
    "\n",
    "print(\"知识库创建成功！ 知识库的ID为：{}， 知识库的名称为： {}\".format(\n",
    "    dataset.dataset_id,\n",
    "    dataset.dataset_name))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4 上传文档到知识库\n",
    "\n",
    "在notebook的同级目录`./qa_system`下，我们提前准备了两篇文档：\n",
    "- `./qa_system/平板电脑说明书.pdf`\n",
    "- `./qa_system/显示器说明书.pdf`\n",
    "\n",
    "我们将这两篇文档上传到知识库中，用于后续问答应用的私域知识检索。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "文档上传到知识库成功！知识库中的文档ID为：['c396c715-66d8-49cb-af9d-af0bd08a16e2', '817d29bb-be72-420b-bee1-c1d9b1116826']\n"
     ]
    }
   ],
   "source": [
    "file_1 = \"./qa_system/平板电脑说明书.pdf\"\n",
    "file_2 = \"./qa_system/显示器说明书.pdf\"\n",
    "\n",
    "document_infos = dataset.add_documents([\n",
    "    file_1, file_2\n",
    "])\n",
    "\n",
    "print(\"文档上传到知识库成功！知识库中的文档ID为：{}\".format(document_infos.document_ids))\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.5 查找知识库中的文档\n",
    "\n",
    "问答系统中的知识库，常用增、删、查操作，我们刚才演示了增加文档的方法，下面演示如何查找文档。\n",
    "\n",
    "我们获取[百度智能云千帆AppBuilder控制台-我的知识](https://console.bce.baidu.com/ai_apaas/dataset)页面中，我们刚才创建的知识库中的文档列表。\n",
    "\n",
    "<img src=\"./qa_system/doc_list.png\" alt=\"drawing\" width=\"1000\"/>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "文档列表获取成功！\n",
      "第二个的文档是: 平板电脑说明书.pdf, 文档ID是: c396c715-66d8-49cb-af9d-af0bd08a16e2, 文档字数是: 8568\n"
     ]
    }
   ],
   "source": [
    "doc_list = dataset.get_documents(page=1,limit=10)\n",
    "print(\"文档列表获取成功！\")\n",
    "print(\"第二个的文档是: {}, 文档ID是: {}, 文档字数是: {}\".format(\n",
    "    doc_list.data[1].name, doc_list.data[1].id, doc_list.data[1].word_count))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.6 删除知识库中的文档\n",
    "\n",
    "下面我们演示如何删除知识库中的文档。我们尝试删除 『平板电脑说明书.pdf』 这篇文档"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "文档: 平板电脑说明书.pdf, 文档ID: c396c715-66d8-49cb-af9d-af0bd08a16e2 删除成功!\n",
      "文档删除成功，请刷新『知识库』前端页面，观察是否已无『平板电脑说明书.pdf』文档\n"
     ]
    }
   ],
   "source": [
    "need_delete_doc_name = \"平板电脑说明书.pdf\"\n",
    "need_delete_doc_id = \"\"\n",
    "\n",
    "doc_list = dataset.get_documents(page=1,limit=10)\n",
    "for doc in doc_list.data:\n",
    "    if doc.name == need_delete_doc_name:\n",
    "        need_delete_doc_id = doc.id\n",
    "        break\n",
    "\n",
    "if need_delete_doc_id == \"\":\n",
    "    print(\"未找到指定文档，请检查 2.4 步骤是否正确上传，以及2.5步骤中是否展示了正确的文档列表\")\n",
    "else:\n",
    "    dataset.delete_documents([need_delete_doc_id])\n",
    "    print(\"文档: {}, 文档ID: {} 删除成功!\".format(\n",
    "        need_delete_doc_name,\n",
    "        need_delete_doc_id\n",
    "    ))\n",
    "\n",
    "print(\"文档删除成功，请刷新『知识库』前端页面，观察是否已无『{}』文档\".format(need_delete_doc_name))\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "执行完上述步骤后，前端页面刷新，指定文档已经被成功删除\n",
    "\n",
    "<img src=\"./qa_system/delete_doc.png\" alt=\"drawing\" width=\"1000\"/>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "当前创建的知识库的ID为：c368a982-699f-4114-9e9a-c08b76c07f92， 知识库的名称为： 说明书知识库\n"
     ]
    }
   ],
   "source": [
    "# 再一次确认当前知识库的 Name 与 ID，方便后续 企业级问答系统 —— 在线问答流程 的使用\n",
    "\n",
    "print(\"当前创建的知识库的ID为：{}， 知识库的名称为： {}\".format(\n",
    "    dataset.dataset_id,\n",
    "    dataset.dataset_name))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 三、总结\n",
    "\n",
    "AppBuilder 为 基于RAG的 企业问答系统提供了 `appbuilder.console.dataset` API，可以方便的以代码态进行知识库的管理，实现基本的增删改查业务需求。\n",
    "\n",
    "本示例展示了如何从零创建知识库，并实现增、查、改的代码，并可以观察SDK与前端页面的联动。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
